Delay and Cooperation in Nonstochastic Bandits

Authors

  • Nicolò Cesa-Bianchi
  • Claudio Gentile
  • Yishay Mansour
  • Alberto Minora
Abstract

We study networks of communicating learning agents that cooperate to solve a common nonstochastic bandit problem. Agents use an underlying communication network to get messages about actions selected by other agents, and drop messages that took more than $d$ hops to arrive, where $d$ is a delay parameter. We introduce EXP3-COOP, a cooperative version of the EXP3 algorithm, and prove that with $K$ actions and $N$ agents the average per-agent regret after $T$ rounds is at most of order $\sqrt{\left(d + 1 + \frac{K}{N}\alpha_{\le d}\right) T \ln K}$, where $\alpha_{\le d}$ is the independence number of the $d$-th power of the communication graph $G$. We then show that for any connected graph, for $d = \sqrt{K}$ the regret bound is $K^{1/4}\sqrt{T}$, strictly better than the minimax regret $\sqrt{KT}$ for noncooperating agents. More informed choices of $d$ lead to bounds which are arbitrarily close to the full information minimax regret $\sqrt{T \ln K}$ when $G$ is dense. When $G$ has sparse components, we show that a variant of EXP3-COOP, allowing agents to choose their parameters according to their centrality in $G$, strictly improves the regret. Finally, as a by-product of our analysis, we provide the first characterization of the minimax regret for bandit learning with delay.
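As a rough illustration of the quantities in the abstract, the sketch below evaluates the order-of-magnitude expression sqrt((d + 1 + (K/N) * alpha) * T * ln K) on a small example graph. It is not the authors' code: the function names (`graph_power`, `independence_number`, `coop_bound`) and the ring-shaped network are assumptions made here for illustration, the independence number is found by brute force (feasible only for tiny graphs), and the constants hidden by "of order" are ignored.

```python
import math
from itertools import combinations


def graph_power(adj, d):
    """d-th power of a graph: u and v are adjacent iff 1 <= dist(u, v) <= d."""
    power = {}
    for source in adj:
        # Breadth-first search from `source`, truncated after d hops.
        dist = {source: 0}
        frontier = [source]
        for hop in range(1, d + 1):
            next_frontier = []
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = hop
                        next_frontier.append(v)
            frontier = next_frontier
        power[source] = set(dist) - {source}
    return power


def independence_number(adj):
    """Size of a largest independent set, by brute force (tiny graphs only)."""
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        for subset in combinations(nodes, size):
            if all(v not in adj[u] for u, v in combinations(subset, 2)):
                return size
    return 0


def coop_bound(K, N, T, d, alpha):
    """Expression from the abstract, with all constants dropped (an assumption)."""
    return math.sqrt((d + 1 + (K / N) * alpha) * T * math.log(K))


# Hypothetical example: N = 8 agents on a ring, K = 16 actions, T = 10000 rounds.
N, K, T = 8, 16, 10_000
ring = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}
for d in (1, 2, 4):
    alpha = independence_number(graph_power(ring, d))
    print(d, alpha, round(coop_bound(K, N, T, d, alpha), 1))
```

Increasing d shrinks the independence number of the power graph but adds to the delay term d + 1, which is the trade-off the abstract refers to when it speaks of more informed choices of d.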

Similar resources

Bandit Regret Scaling with the Effective Loss Range

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved, if the effective range of the losses in each round is small (e.g. the maximal difference between two losses in a given round). Despite a recent impossibility result, we show how this can be made possible under certain mild additional assumptions, such as availability of rough estimates of the losses, or adv...

Analyzes of the effects and security function social capital in sustainable rural border areas the villages of the central city of Saravan

The objective study was to investigate the effects of social capital on sustainability security in the villages the border areas. Statistical population including the villages of the central city of Saravan are heads of households (N= 9946). 421 households (23 villages) using Cochran formula and simple random sampling were selected. for the analysis data, descriptive and inferential statistics ...

Bandits with Delayed, Aggregated Anonymous Feedback

We study a variant of the stochastic K-armed bandit problem, which we call “bandits with delayed, aggregated anonymous feedback”. In this problem, when the player pulls an arm, a reward is generated, however it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The...

Following the Perturbed Leader to Gamble at Multi-armed Bandits

Following the perturbed leader (fpl) is a powerful technique for solving online decision problems. Kalai and Vempala [1] rediscovered this algorithm recently. A traditional model for online decision problems is the multi-armed bandit. In it a gambler has to choose at each round one of the k levers to pull with the intention to minimize the cumulated cost. There are four versions of the nonstoch...

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

We present and study a partial-information model of online learning, where a decision maker repeatedly chooses from a finite set of actions, and observes some subset of the associated losses. This naturally models several situations where the losses of different actions are related, and knowing the loss of one action provides information on the loss of other actions. Moreover, it...

Publication date: 2016